GoBike Data Analysis

Goals

In this analysis, I aim to uncover any patterns, trends, or insights from a spreadsheet containing 519,000+ rows of data pertaining to bike rentals. The data contains basic information like:

  • Start & End times
  • Start & End locations & IDs
  • Bike IDs
  • User information - gender, birth year, customer type

Some Questions Worth Answering

  • Customer demographics
    • What age group uses bikes more often?
    • Which gender rents more bikes?
  • Ride statistics
    • Average ride length?
    • When were the most popular times to rent a bike?
  • Location demographics
    • Do any stations see more traffic?
    • What are the least used stations?
  • Miscellaneous
    • Which bike was used the most/least?

Data Exploration

duration_sec start_time end_time start_station_id start_station_name start_station_latitude start_station_longitude end_station_id end_station_name end_station_latitude end_station_longitude bike_id user_type member_birth_year member_gender
0 80110 12/31/2021 16:57 1/1/2022 15:12 74 Laguna St at Hayes St 37.776435 -122.426244 43 San Francisco Public Library (Grove St at Hyde... 37.778768 -122.415929 96 Customer 1987.0 Male
1 78800 12/31/2021 15:56 1/1/2022 13:49 284 Yerba Buena Center for the Arts (Howard St at ... 37.784872 -122.400876 96 Dolores St at 15th St 37.766210 -122.426614 88 Customer 1965.0 Female
2 45768 12/31/2021 22:45 1/1/2022 11:28 245 Downtown Berkeley BART 37.870348 -122.267764 245 Downtown Berkeley BART 37.870348 -122.267764 1094 Customer NaN NaN
3 62172 12/31/2021 17:31 1/1/2022 10:47 60 8th St at Ringold St 37.774520 -122.409449 5 Powell St BART Station (Market St at 5th St) 37.783899 -122.408445 2831 Customer NaN NaN
4 43603 12/31/2021 14:23 1/1/2022 2:29 239 Bancroft Way at Telegraph Ave 37.868813 -122.258764 247 Fulton St at Bancroft Way 37.867789 -122.265896 3167 Subscriber 1997.0 Female
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 519700 entries, 0 to 519699
Data columns (total 15 columns):
 #   Column                   Non-Null Count   Dtype  
---  ------                   --------------   -----  
 0   duration_sec             519700 non-null  int64  
 1   start_time               519700 non-null  object 
 2   end_time                 519700 non-null  object 
 3   start_station_id         519700 non-null  int64  
 4   start_station_name       519700 non-null  object 
 5   start_station_latitude   519700 non-null  float64
 6   start_station_longitude  519700 non-null  float64
 7   end_station_id           519700 non-null  int64  
 8   end_station_name         519700 non-null  object 
 9   end_station_latitude     519700 non-null  float64
 10  end_station_longitude    519700 non-null  float64
 11  bike_id                  519700 non-null  int64  
 12  user_type                519700 non-null  object 
 13  member_birth_year        453159 non-null  float64
 14  member_gender            453238 non-null  object 
dtypes: float64(5), int64(4), object(6)
memory usage: 59.5+ MB
duration_sec start_station_id start_station_latitude start_station_longitude end_station_id end_station_latitude end_station_longitude bike_id member_birth_year
count 519700.000000 519700.000000 519700.000000 519700.000000 519700.000000 519700.000000 519700.000000 519700.000000 453159.000000
mean 1099.009521 95.034245 37.771653 -122.363927 92.184041 37.771844 -122.363236 1672.533079 1980.404787
std 3444.146451 86.083078 0.086305 0.105573 84.969491 0.086224 0.105122 971.356959 10.513488
min 61.000000 3.000000 37.317298 -122.444293 3.000000 37.317298 -122.444293 10.000000 1886.000000
25% 382.000000 24.000000 37.773492 -122.411726 23.000000 37.774520 -122.410345 787.000000 1974.000000
50% 596.000000 67.000000 37.783521 -122.398870 66.000000 37.783830 -122.398525 1728.500000 1983.000000
75% 938.000000 139.000000 37.795392 -122.391034 134.000000 37.795392 -122.391034 2520.000000 1988.000000
max 86369.000000 340.000000 37.880222 -121.874119 340.000000 37.880222 -121.874119 3733.000000 1999.000000
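
The three outputs above look like the results of pandas' head(), info(), and describe(). Here is a minimal sketch of that exploration step. The real file name is not given, so the inline CSV is a hypothetical stand-in built from three rows of the preview and reduced to a few columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: three rows copied from the
# preview above, reduced to a handful of columns.
sample_csv = io.StringIO(
    "duration_sec,start_time,end_time,bike_id,user_type,member_birth_year,member_gender\n"
    "80110,12/31/2021 16:57,1/1/2022 15:12,96,Customer,1987,Male\n"
    "45768,12/31/2021 22:45,1/1/2022 11:28,1094,Customer,,\n"
    "43603,12/31/2021 14:23,1/1/2022 2:29,3167,Subscriber,1997,Female\n"
)

df = pd.read_csv(sample_csv)

print(df.head())      # first rows of the table
df.info()             # column dtypes and non-null counts
print(df.describe())  # summary statistics for the numeric columns
```

With the full dataset, the same three calls would be run on the DataFrame loaded from the actual spreadsheet.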

Observations

A few things stand out from the output above:

  • Duration: The mean bike ride was ~1,100 seconds, or just over 18 minutes. The median was 596 seconds (about 10 minutes), so a small number of very long rides pull the mean up
  • Birth Year: The average birth year is 1980, and 75% of customers were born in or before 1988
    • Outliers: The earliest birth year is reported as 1886 - user error?
  • Gender: There are roughly 3.5x as many male users as female users, or ~3.3x as many as female and other users combined
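
The duration and birth-year figures can be checked with a little arithmetic. This sketch uses values copied from the describe() output; the ride year of 2021 is an assumption based on the start_time values in the preview.

```python
# Values copied from the describe() output above.
mean_duration_sec = 1099.009521
median_duration_sec = 596.0
earliest_birth_year = 1886

mean_minutes = mean_duration_sec / 60
median_minutes = median_duration_sec / 60

# Assumption: rides took place in 2021, per the start_time values in the preview.
RIDE_YEAR = 2021
implied_max_age = RIDE_YEAR - earliest_birth_year

print(f"mean ride:   {mean_minutes:.1f} min")    # 18.3
print(f"median ride: {median_minutes:.1f} min")  # 9.9
print(f"implied age for birth year 1886: {implied_max_age}")  # 135
```

An implied age of 135 supports reading the 1886 entry as a data-entry error rather than a real rider.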

Checking for Missing Data

As the table below shows, two columns (member_birth_year and member_gender) are each missing ~12.8% of their values. For simplicity's sake, I removed the rows with missing values.

   column             %_missing  #_rows
0  member_birth_year  12.80      66541.0
1  member_gender      12.79      66462.0
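
One way such a missing-value table could be produced, and the incomplete rows dropped, is sketched below. The five-row DataFrame is a hypothetical sample, and the variable names (missing, summary, cleaned) are illustrative rather than taken from the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two of the five rows lack member information.
df = pd.DataFrame({
    "duration_sec": [80110, 78800, 45768, 62172, 43603],
    "member_birth_year": [1987.0, 1965.0, np.nan, np.nan, 1997.0],
    "member_gender": ["Male", "Female", None, None, "Female"],
})

# Count missing values per column and express them as a share of all rows.
missing = df.isna().sum()
summary = pd.DataFrame({
    "column": missing.index,
    "%_missing": (missing / len(df) * 100).round(2).values,
    "#_rows": missing.values,
})

# Keep only the columns that actually have gaps.
summary = summary[summary["#_rows"] > 0].reset_index(drop=True)
print(summary)

# Drop every row where either member field is missing.
cleaned = df.dropna(subset=["member_birth_year", "member_gender"])
print(len(cleaned), "rows remain")
```

Dropping rows is the simplest option. It also removes every ride by a user without profile data, so the results below describe only users who reported a birth year and gender.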

Customer Demographics

Single-Time Users vs. GoBike Subscribers

Age & Gender Distribution

member_gender   Count
Female          98542
Male           348318
Other            6299
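
To put the counts in proportion, the sketch below rebuilds the table as a pandas Series from the reported numbers and derives each group's share. On the full DataFrame, the counts themselves would presumably come from something like value_counts() on member_gender; that is an assumption about the original notebook.

```python
import pandas as pd

# Counts copied from the table above.
gender_counts = pd.Series(
    {"Female": 98542, "Male": 348318, "Other": 6299}, name="Count"
)

# Share of each group, in percent of all rides with a reported gender.
shares = (gender_counts / gender_counts.sum() * 100).round(2)

# Ratio of male to female users.
ratio = gender_counts["Male"] / gender_counts["Female"]

print(shares)
print(f"male-to-female ratio: {ratio:.2f}")  # 3.53
```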

Age vs. Ride Duration

Observations

  • Men aged 25-44 are the main customers
  • Only ~7,500 people aged 18-24 rented bikes - a bit surprising?
  • People in their 30s to 50s tend to ride for 12-15 minutes - commuting to/from work?
  • Younger folk and seniors have more varied ride durations - fewer commitments, more free time, etc.
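
Age-group comparisons like these can be sketched by deriving an age from the birth year and binning it. The bin edges, the reference year of 2021, and the six-row sample are all assumptions for illustration; the original notebook may have grouped ages differently.

```python
import pandas as pd

# Hypothetical sample of cleaned rides.
rides = pd.DataFrame({
    "duration_sec": [600, 900, 780, 1500, 420, 2400],
    "member_birth_year": [1999, 1990, 1980, 1955, 1985, 1998],
    "member_gender": ["Male", "Male", "Female", "Male", "Other", "Female"],
})

REFERENCE_YEAR = 2021  # assumption: age is measured in the year of the rides
rides["age"] = REFERENCE_YEAR - rides["member_birth_year"]

# Left-closed bins: [18, 25), [25, 35), and so on.
bins = [18, 25, 35, 45, 55, 65, 120]
labels = ["18-24", "25-34", "35-44", "45-54", "55-64", "65+"]
rides["age_group"] = pd.cut(rides["age"], bins=bins, labels=labels, right=False)

# Number of rides per age group and gender.
counts = rides.groupby(["age_group", "member_gender"], observed=True).size()

# Median ride duration per age group, in minutes.
median_minutes = (
    rides.groupby("age_group", observed=True)["duration_sec"].median() / 60
)

print(counts)
print(median_minutes)
```

The median is used here instead of the mean because a few day-long rentals would otherwise dominate each group's average.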

Date & Time

Earliest date: 2021-06-28 09:47:00
  Latest date: 2021-12-31 23:59:00
Label          Time
Early Morning  3:00 - 5:59am
Morning        6:00 - 11:59am
Afternoon      12:00pm - 4:59pm
Evening        5:00 - 8:59pm
Night          9:00 - 11:59pm
Late Night     12:00am - 2:59am
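
The labels in this table can be assigned from the hour of each ride's start time. A minimal sketch, assuming start_time uses the month/day/year format shown in the preview; the helper name time_of_day is hypothetical.

```python
import pandas as pd


def time_of_day(hour: int) -> str:
    """Map an hour of the day (0-23) to the labels in the table above."""
    if hour < 3:
        return "Late Night"
    if hour < 6:
        return "Early Morning"
    if hour < 12:
        return "Morning"
    if hour < 17:
        return "Afternoon"
    if hour < 21:
        return "Evening"
    return "Night"


# The five start_time values from the data preview.
start_time = pd.to_datetime(
    pd.Series([
        "12/31/2021 16:57",
        "12/31/2021 15:56",
        "12/31/2021 22:45",
        "12/31/2021 17:31",
        "12/31/2021 14:23",
    ]),
    format="%m/%d/%Y %H:%M",
)

labels = start_time.dt.hour.map(time_of_day)
print(labels.tolist())
```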

Observations

  • Fall seems to be the most popular time to rent bikes - temperatures cool down, but it's not too chilly yet
  • Mornings and evenings see the most rentals in each month - probably due to commuting?
  • Very few night/late night rentals - weather & safety concerns?
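
Seasonal and hourly patterns like these come down to counting rides per month and per hour. The sketch below does this on a hypothetical handful of timestamps that fall within the date range reported above.

```python
import pandas as pd

# Hypothetical start times within the reported date range.
start_time = pd.to_datetime(pd.Series([
    "2021-06-28 09:47",
    "2021-09-14 08:05",
    "2021-09-14 17:40",
    "2021-10-02 18:15",
    "2021-10-20 07:55",
    "2021-12-31 23:59",
]))

# Rides per calendar month, in chronological order.
rides_per_month = start_time.dt.strftime("%Y-%m").value_counts().sort_index()

# Rides per hour of the day (0-23).
rides_per_hour = start_time.dt.hour.value_counts().sort_index()

print(rides_per_month)
print(rides_per_hour)
```

Because the data starts on June 28, June holds only a few days of rides. Its total is not directly comparable with the full months that follow.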

Location Demographics

Observations

  • Most of the traffic is in San Francisco, particularly along Market Street
    • More rentals took place near BART stations and the waterfront - tourism?
  • Oakland saw significantly fewer rentals - maybe due to more locals and fewer tourists?
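
Station traffic can be ranked by counting how often each station appears as a start or end point. In this sketch the station names come from the data preview, but the trips themselves are a hypothetical sample. Counting starts and ends together is one possible definition of traffic, not necessarily the one used for the original maps.

```python
import pandas as pd

# Hypothetical trips using station names from the data preview.
trips = pd.DataFrame({
    "start_station_name": [
        "Downtown Berkeley BART",
        "Downtown Berkeley BART",
        "8th St at Ringold St",
        "Laguna St at Hayes St",
    ],
    "end_station_name": [
        "Downtown Berkeley BART",
        "Dolores St at 15th St",
        "Downtown Berkeley BART",
        "8th St at Ringold St",
    ],
})

# Traffic = departures + arrivals; fill_value=0 covers stations that
# appear only as a start or only as an end.
traffic = (
    trips["start_station_name"].value_counts()
    .add(trips["end_station_name"].value_counts(), fill_value=0)
    .astype(int)
    .sort_values(ascending=False)
)

print(traffic.head(3))  # busiest stations
print(traffic.tail(3))  # least used stations
```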

Summary

  • People between 25-44 tend to rent bikes more often, but ride them for shorter durations.
  • People outside that age range tend to rent bikes less often, but ride them for longer durations.
  • Mornings and evenings see the most rentals, probably due to people's work commutes.
  • More bike rentals occur as summer becomes fall, but slowly drop as fall becomes winter.
  • The bike stations closer to major points of interest (BART stations, popular tourist areas) see more rentals.